second cluster
Clustering in hyperbolic balls
Jaćimović, Vladimir, Crnkić, Aladin
The idea of representations of the data in negatively curved manifolds recently attracted a lot of attention and gave a rise to the new research direction named {\it hyperbolic machine learning} (ML). In order to unveil the full potential of this new paradigm, efficient techniques for data analysis and statistical modeling in hyperbolic spaces are necessary. In the present paper rigorous mathematical framework for clustering in hyperbolic spaces is established. First, we introduce the $k$-means clustering in hyperbolic balls, based on the novel definition of barycenter. Second, we present the expectation-maximization (EM) algorithm for learning mixtures of novel probability distributions in hyperbolic balls. In such a way we lay the foundation of unsupervised learning in hyperbolic spaces.
A Note On $k$-Means Probabilistic Poverty
Kleinberg [2] coined the term of k -richness of distance-based clustering algorithms, meaning the possibility to partition a set of objects into any k nonempty (disjoint) subsets via modifying the distances between these obje cts. However, there exist non-deterministic, probabilistic algorithms which do not fi t this characterization because of non-deterministic behaviour. Therefore Ackerman at el [1, Definition 3 ( k -Richness)] introduce the concept of probabilistic k -richness. This kind of richness they defined as Property 1. For any partition Γ of the set X consisting of exactly k clusters and every ǫ 0 there exists such a distance function d that the clustering function returns this partition Γ with probability exceeding 1 ǫ . They postulate in their Fig.2 (omitting the proof) that probabilistic k -richness in probabilistic sense is possessed by version of the k -means